Method Overview

  • gc.searchDatasets() -- Returns an iterator over the Datasets on the server.
  • gc.searchVariantSets(datasetId) -- Returns an iterator over the VariantSets in the specified Dataset.
  • gc.searchCallSets(variantSetId, name=None) -- Returns an iterator over the CallSets in the specified VariantSet, optionally filtered by name.
  • gc.searchVariants(variantSetId, start=None, end=None, referenceName=None, callSetIds=None) -- Returns an iterator over the Variants fulfilling the specified conditions from the specified VariantSet.
  • gc.searchVariantAnnotationSets(variantSetId) -- Returns an iterator over the VariantAnnotationSets in the specified VariantSet.
  • gc.searchVariantAnnotations(variantAnnotationSetId, referenceName=None, referenceId=None, start=None, end=None, featureIds=[], effects=[]) -- Returns an iterator over the Annotations fulfilling the specified conditions from the specified AnnotationSet.
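Because every search method returns an iterator, results can be previewed without pulling the full result set. The following is a minimal sketch of that idea, not part of the ga4gh client: `first_n` is an illustrative helper, and `fake_search_datasets` is a hypothetical stand-in generator (with placeholder names) used so the snippet runs without a server.

```python
from itertools import islice


def first_n(iterator, n):
    """Return at most n items from an iterator, leaving the rest unconsumed."""
    return list(islice(iterator, n))


def fake_search_datasets():
    """Hypothetical stand-in for gc.searchDatasets(); yields placeholder names."""
    for name in ("ds_a", "ds_b", "ds_c"):
        yield name


# Only the first two items are requested from the generator.
preview = first_n(fake_search_datasets(), 2)
print(preview)
```

With a live client, the same helper could presumably be applied as `first_n(gc.searchDatasets(), 2)`.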


In [2]:
import pandas as pd

import ga4gh.client
print(ga4gh.__version__)

gc = ga4gh.client.HttpClient("http://localhost:8000")
region_constraints = dict(referenceName="1", start=0, end=int(1e10))


0.1.dev632+ncb43455c1003

Fetch Data Sets


In [3]:
data_sets = pd.DataFrame(ds.toJsonDict() for ds in gc.searchDatasets())
data_sets.head()


Out[3]:
   description       id   name
0         None  YnJjYTE  brca1

Variant Sets for each Data Set (currently only one)


In [4]:
variant_sets = pd.DataFrame([
    {'data_set_id': ds.id,
     'variant_set_id': vs.id,
     'variant_set_name': vs.name}
    for ds in gc.searchDatasets()
    for vs in gc.searchVariantSets(ds.id)
    ])
variant_sets.head()


Out[4]:
   data_set_id        variant_set_id  variant_set_name
0      YnJjYTE      YnJjYTE6V0FTSDdQ            WASH7P
1      YnJjYTE  YnJjYTE6MWtnUGhhc2Uz         1kgPhase3
2      YnJjYTE        YnJjYTE6T1I0Rg              OR4F
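The identifiers in these outputs appear to be unpadded URL-safe base64 encodings of colon-joined names. This is an observation about this particular output, not a documented guarantee, so IDs are best treated as opaque in real code. The sketch below uses a hypothetical helper, `decode_id`, intended for inspection only.

```python
import base64


def decode_id(opaque_id):
    """Decode an unpadded URL-safe base64 ID (assumed encoding) to text."""
    # Restore the '=' padding that base64 decoding requires.
    padded = opaque_id + "=" * (-len(opaque_id) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# IDs copied from the outputs above.
for opaque in ("YnJjYTE", "YnJjYTE6V0FTSDdQ", "YnJjYTE6MWtnUGhhc2Uz"):
    print(opaque, "->", decode_id(opaque))
```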

Call Sets (by variant set)


In [5]:
call_sets = pd.DataFrame([
    {
     'data_set_id': ds.id,
     'variant_set_id': vs.id,
     'variant_set_name': vs.name,
     'call_set_id': cs.id,
     'call_set_name': cs.name,
    }
    for ds in gc.searchDatasets()
    for vs in gc.searchVariantSets(ds.id)
    for cs in gc.searchCallSets(vs.id)
    ])
call_sets.head()


Out[5]:
                       call_set_id  call_set_name  data_set_id        variant_set_id  variant_set_name
0  YnJjYTE6MWtnUGhhc2UzOkhHMDAwOTY        HG00096      YnJjYTE  YnJjYTE6MWtnUGhhc2Uz         1kgPhase3
1  YnJjYTE6MWtnUGhhc2UzOkhHMDAwOTk        HG00099      YnJjYTE  YnJjYTE6MWtnUGhhc2Uz         1kgPhase3
2  YnJjYTE6MWtnUGhhc2UzOkhHMDAxMDE        HG00101      YnJjYTE  YnJjYTE6MWtnUGhhc2Uz         1kgPhase3

Variant Annotation Sets (by variant set)


In [6]:
variant_annotation_sets = pd.DataFrame([
    {
     'data_set_id': ds.id,
     'variant_set_id': vs.id,
     'variant_set_name': vs.name,
     'variant_annotation_set_id': vas.id,
     'variant_annotation_set_name': vas.name,
    }
    for ds in gc.searchDatasets()
    for vs in gc.searchVariantSets(ds.id)
    for vas in gc.searchVariantAnnotationSets(vs.id)
    ])
variant_annotation_sets.head()


Out[6]:
   data_set_id                   variant_annotation_set_id  variant_annotation_set_name    variant_set_id  variant_set_name
0      YnJjYTE  YnJjYTE6V0FTSDdQOnZhcmlhbnRhbm5vdGF0aW9ucw                       WASH7P  YnJjYTE6V0FTSDdQ            WASH7P
1      YnJjYTE     YnJjYTE6T1I0Rjp2YXJpYW50YW5ub3RhdGlvbnM                         OR4F    YnJjYTE6T1I0Rg              OR4F

Variant Annotations (by variant set and region)


In [7]:
annotation_counts = pd.DataFrame([
    {
     'data_set_id': ds.id,
     'variant_set_id': vs.id,
     'variant_set_name': vs.name,
     # sum(1 for _ in iterator) counts the items each search yields.
     'n_callsets': sum(1 for _ in gc.searchCallSets(vs.id)),
     'n_variants': sum(1 for _ in gc.searchVariants(vs.id, **region_constraints)),
     'n_annotation_sets': sum(1 for _ in gc.searchVariantAnnotationSets(vs.id)),
     'n_annotations': sum(1
                          for vas in gc.searchVariantAnnotationSets(vs.id)
                          for _ in gc.searchVariantAnnotations(vas.id, **region_constraints)
                         ),
    }
    for ds in gc.searchDatasets()
    for vs in gc.searchVariantSets(ds.id)
    # One row per annotation set, so variant sets without annotations are skipped.
    for vas in gc.searchVariantAnnotationSets(vs.id)
    ])
annotation_counts.head()


Out[7]:
   data_set_id  n_annotation_sets  n_annotations  n_callsets  n_variants    variant_set_id  variant_set_name
0      YnJjYTE                  1            116           0         116  YnJjYTE6V0FTSDdQ            WASH7P
1      YnJjYTE                  1            840           0         840    YnJjYTE6T1I0Rg              OR4F
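The counting pattern used in this cell can be tried without a server. The sketch below is an illustration under stated assumptions: `StubClient` is a hypothetical in-memory stand-in that mimics two of the search method signatures listed in the overview, its data is made up, and its region filter (matching on `start` only) is a simplification rather than the server's actual overlap logic.

```python
from types import SimpleNamespace


def count(iterator):
    """Count the items an iterator yields without storing them."""
    return sum(1 for _ in iterator)


class StubClient:
    """Hypothetical stand-in for the HTTP client, backed by plain dicts."""

    def __init__(self, call_sets, variants):
        self._call_sets = call_sets
        self._variants = variants

    def searchCallSets(self, variantSetId, name=None):
        return iter(self._call_sets.get(variantSetId, []))

    def searchVariants(self, variantSetId, start=None, end=None,
                       referenceName=None, callSetIds=None):
        # Simplified filter: match reference name and start position only.
        for v in self._variants.get(variantSetId, []):
            if referenceName is not None and v.referenceName != referenceName:
                continue
            if start is not None and v.start < start:
                continue
            if end is not None and v.start >= end:
                continue
            yield v


# Made-up data: one variant set with no call sets and two variants.
stub = StubClient(
    call_sets={"vs1": []},
    variants={"vs1": [SimpleNamespace(referenceName="1", start=10),
                      SimpleNamespace(referenceName="2", start=20)]},
)
region = dict(referenceName="1", start=0, end=100)

summary = {
    "n_callsets": count(stub.searchCallSets("vs1")),
    "n_variants": count(stub.searchVariants("vs1", **region)),
}
print(summary)
```

Counting with a generator expression keeps memory use flat, at the cost of one full pass over each iterator.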
